Mathematics Variance and Standard Deviation

Topics covered

`star` Limitations of mean deviation
`star` Variance and Standard Deviation
`star` Standard Deviation
`star` Standard deviation of a discrete frequency distribution
`star` Standard deviation of a continuous frequency distribution
`star` Shortcut method to find variance and standard deviation

Limitations of mean deviation

`color{green} ✍️` In a series, where the degree of variability is very high, the median is not a representative central tendency.

`color{green} ✍️` Thus, the mean deviation about the median calculated for such a series cannot be fully relied upon.

`color{green} ✍️` The sum of the deviations from the mean (minus signs ignored) is more than the sum of the deviations from median.

`color{green} ✍️` Therefore, the mean deviation about the mean is not very scientific. Thus, in many cases, mean deviation may give unsatisfactory results.

`color{green} ✍️` Also, mean deviation is calculated on the basis of absolute values of the deviations and therefore cannot be subjected to further algebraic treatment.

`color{green} ✍️` This implies that we must have some other measure of dispersion. Standard deviation is such a measure of dispersion.

Variance and Standard Deviation

`color{green} ✍️` Recall that while calculating mean deviation about mean or median, the absolute values of the deviations were taken. The absolute values were taken to give meaning to the mean deviation, otherwise the deviations may cancel among themselves.

`color{green} ✍️` Another way to overcome this difficulty, which arises from the signs of the deviations, is to take the squares of all the deviations.

`color{green} ✍️` Obviously all these squares of deviations are non-negative. Let `color{navy}(x_1, x_2, x_3, ..., x_n)` be `n` observations and `color{navy}(barx)` be their mean.

Then `color{navy}((x_1-barx)^2 + (x_2-barx)^2 + ... + (x_n-barx)^2 = sum_(i=1)^(n) (x_i-barx)^2)`

`color{green} ✍️` If this sum is zero, then each `color{navy}((x_i - barx))` has to be zero.

`color{green} ✍️` This implies that there is no dispersion at all, as all observations are equal to the mean `color{navy}(barx)`.

`color{green} ✍️` If `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is small, this indicates that the observations `color{navy}(x_1, x_2, x_3, ..., x_n)` are close to the mean `barx` and therefore there is a lower degree of dispersion.

`color{green} ✍️` On the contrary, if this sum is large, there is a higher degree of dispersion of the observations from the mean `color{navy}(barx)`. Can we thus say that the sum `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is a reasonable indicator of the degree of dispersion or scatter?

`color(red)(=>"Let us take the set A of six observations 5, 15, 25, 35, 45, 55." )`

The mean of the observations is `color{navy}(barx=30)`. The sum of squares of deviations from `barx` for this set is

`color{navy}(sum_(i=1)^(6) (x_i-barx)^2 = (5-30)^2 +(15-30)^2 + (25-30)^2 + (35-30)^2 + (45-30)^2 + (55-30)^2)`

`" " color{navy}(= 625 + 225 + 25 + 25 + 225 + 625 = 1750)`

`color(red)(=>"Let us now take another set B of 31 observations")`

`color{navy}(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)`.

The mean of these observations is `color{navy}(bary = 30)`

Note that both the sets `A` and `B` of observations have a mean of `30.`

Now, the sum of squares of deviations of observations for set B from the mean `color{navy}(bary)` is given by

`color{navy}(sum_(i=1)^(31) (y_i-bary)^2 = (15-30)^2 +(16-30)^2 + (17-30)^2 + ...+ (44-30)^2 +(45-30)^2)`

`" " color{navy}(= (-15)^2 +(-14)^2 + ...+ (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 + ...+ 14^2 + 15^2)`

`" " color{navy}(= 2 [15^2 + 14^2 + ... + 1^2])`

`" "color{navy}(= 2 xx (15xx(15+1)(30+1))/6 = 5 × 16 × 31 = 2480)`

(Because the sum of squares of the first `n` natural numbers `color{navy}(= (n(n+1)(2n+1))/6)`. Here `color{navy}(n = 15)`.)

If `color{navy}(sum_(i=1)^(n) (x_i-barx)^2)` is simply our measure of dispersion or scatter about mean, we will tend to say that the set `A` of six observations has a lesser dispersion about the mean than the set `B` of `31` observations, even though the observations in set `A` are more scattered from the mean (the range of deviations being from `–25` to `25`) than in the set `B` (where the range of deviations is from `–15` to `15`).

This is also clear from the following diagrams.



Thus, we can say that the sum of squares of deviations from the mean is not a proper measure of dispersion. To overcome this difficulty we take the mean of the squares of the deviations, i.e., we take `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)`. In case of the set `A`, we have

Mean `color{navy}(= 1/6 xx 1750= 291.67)` and in case of the set `B`, it is `color{navy}(1/31 xx 2480 = 80)`.

This indicates that the scatter or dispersion is more in set `A` than in set `B`, which agrees with the geometrical representation of the two sets.
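The comparison of sets `A` and `B` can be checked numerically. The following is a minimal Python sketch; the function names `sum_sq_dev` and `mean_sq_dev` are illustrative choices, not part of the text.

```python
# Sets A and B as given in the text above.
set_a = [5, 15, 25, 35, 45, 55]
set_b = list(range(15, 46))  # 15, 16, ..., 45 (31 observations)


def sum_sq_dev(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def mean_sq_dev(data):
    """Mean of squared deviations from the mean (divides by n)."""
    return sum_sq_dev(data) / len(data)


# The raw sum ranks B as more dispersed, the mean of squares ranks A higher.
print(sum_sq_dev(set_a), sum_sq_dev(set_b))              # 1750.0 2480.0
print(round(mean_sq_dev(set_a), 2), mean_sq_dev(set_b))  # 291.67 80.0
```

The raw sum grows with the number of observations, which is why set `B` looks more scattered until the sum is divided by `n`.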

`color{green} ✍️` Thus, we can take `color{navy}(1/n sum_(i=1)^(n) (x_i-barx)^2)` as a quantity which leads to a proper measure of dispersion.

`color{green} ✍️` This number, i.e., mean of the squares of the deviations from mean is called the variance and is denoted by `color{navy}(σ^2)` (read as sigma square).

Therefore, the variance of `n` observations `color{navy}(x_1, x_2,........, x_n)` is given by

`" " color{red}(σ^2) = 1/n sum_(i=1)^(n) (x_i-barx)^2`
Q 3156167974

Find the Variance of the following data:
6, 8, 10, 12, 14, 16, 18, 20, 22, 24

Solution:

From the given data we can form the following Table 15.7. The mean is
calculated by step-deviation method taking 14 as assumed mean. The number of
observations is n = 10

Therefore Mean `bar x = "assumed mean" + (sum_(i=1)^n d_i)/n xx h = 14 + 5/10 xx 2 = 15`, where `d_i = (x_i - 14)/2` are the step-deviations and `h = 2`,

and Variance `(σ^2) = 1/n sum_(i=1)^10 (x_i - bar x)^2 = 1/10 xx 330 = 33`

Thus Standard deviation `(σ ) = sqrt (33) = 5.74`
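The arithmetic of this solution can be reproduced with a short Python sketch. The variable names are illustrative; the assumed mean 14 and the scale 2 are the ones used in the solution above.

```python
import math

data = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
n = len(data)
A, h = 14, 2                       # assumed mean and scale from the solution

d = [(x - A) // h for x in data]   # step-deviations: -4, -3, ..., 5
mean = A + sum(d) / n * h          # 14 + 5/10 * 2
variance = sum((x - mean) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))  # 15.0 33.0 5.74
```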

Standard Deviation

`color{green} ✍️` In the calculation of variance, we find that the units of the individual observations `x_i` and of their mean `color{navy}(barx)` are different from the unit of the variance, since variance involves the sum of squares of `color{navy}((x_i - barx))`.

For this reason, the proper measure of dispersion about the mean of a set of observations is expressed as positive square-root of the variance and is called standard deviation.

Therefore, the standard deviation, usually denoted by `color{navy}(σ)` , is given by

`color{blue}(σ = sqrt(1/n sum_(i=1)^(n) (x_i-barx)^2))`
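As a quick check of these definitions, Python's standard `statistics` module can be used. Note the assumption about which functions match the text: `pvariance` and `pstdev` divide by `n`, as in the formulas above, whereas `variance` and `stdev` divide by `n - 1` and therefore give different values. The data is set `A` from the earlier discussion.

```python
import statistics

set_a = [5, 15, 25, 35, 45, 55]

var_a = statistics.pvariance(set_a)   # divides by n, as in the text
sd_a = statistics.pstdev(set_a)       # positive square root of var_a

print(round(var_a, 2), round(sd_a, 2))   # 291.67 17.08

# The sample variance divides by n - 1 instead: 1750 / 5.
print(statistics.variance(set_a))
```

The standard deviation, about 17.08, is in the same units as the observations, while the variance is in squared units.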

Standard deviation of a discrete frequency distribution

`color{green} ✍️` Let the given discrete frequency distribution be

`color{navy}(x : " " x_1 " " x_2" " x_3 ,. . . , x_n)`

`color{navy}(f : " " f_1 " " f_2" " f_3 ,. . . , f_n)`

In this case standard deviation `color{blue}(σ = sqrt(1/N sum_(i=1)^(n) f_i (x_i-barx)^2))`

where `color{navy}(N= sum_(i=1)^(n) f_i)`
Q 3116167979

Find the variance and standard deviation for the following data:

Solution:

Presenting the data in tabular form (Table 15.8), we get

`N = 30 , sum_(i=1)^7 f_i x_i = 420 , sum_(i=1)^7 f_i (x_i - bar x )^2 =1374`

Therefore ` bar x = (sum_(i=1)^7 f_i x_i )/N=1/30 xx 420 =14`

Hence variance ` (σ^2 )=1/N sum_(i=1)^7 f_i (x_i - bar x )^2`

`=1/30 xx 1374 =45.8`

and Standard deviation `(σ) = sqrt (45.8) = 6.77`
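The computation for a discrete frequency distribution can be sketched in Python as follows. The table for this example is not reproduced in this text, so the values of `x` and `f` below are an assumption: they are an illustrative data set that reproduces the totals quoted in the solution (`N = 30`, `sum f_i x_i = 420`, `sum f_i (x_i - barx)^2 = 1374`).

```python
import math

# Assumed data, chosen to reproduce the totals quoted in the solution;
# the original table is not shown in this text.
x = [4, 8, 11, 17, 20, 24, 32]
f = [3, 5, 9, 5, 4, 3, 1]

N = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
mean = sum_fx / N
sum_f_dev2 = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x))
variance = sum_f_dev2 / N
std_dev = math.sqrt(variance)

print(N, sum_fx, mean, sum_f_dev2)             # 30 420 14.0 1374.0
print(round(variance, 2), round(std_dev, 2))   # 45.8 6.77
```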

Standard deviation of a continuous frequency distribution

The given continuous frequency distribution can be represented as a discrete frequency distribution by replacing each class by its mid-point.

Then, the standard deviation is calculated by the technique adopted in the case of a discrete frequency distribution.

If there is a frequency distribution of n classes each class defined by its mid-point `color{navy}(x_i)` with frequency `color{navy}(f_i,)` the standard deviation will be obtained by the formula

`" " color{navy}(σ = sqrt(1/N sum_(i=1)^(n)f_i (x_i-barx)^2))`

where `color{navy}(barx)` is the mean of the distribution and `color{navy}(N= sum_(i=1)^(n) f_i)`

`color(green)(ul"Another formula for standard deviation")` We know that

Variance `color{green}((σ^2) = 1/N sum_(i=1)^(n) f_i (x_i-barx)^2 = 1/N sum_(i=1)^(n) f_i (x_(i)^(2) + barx^2 - 2barx x_i))`

` " " = 1/N [sum_(i=1)^(n) f_ix_(i)^(2) + sum_(i=1)^(n) barx^2 f_i - sum_(i=1)^(n) 2barx f_ix_i]`

` " " = 1/N [sum_(i=1)^(n) f_ix_(i)^(2) + barx^2 sum_(i=1)^(n) f_i - 2barx sum_(i=1)^(n)f_ix_i]`

` " " = 1/N [sum_(i=1)^(n)f_ix_(i)^(2) + barx^2 N - 2barx . N barx]`

`" " ["Here " 1/N sum_(i=1)^(n)x_i f_i= barx " or " sum_(i=1)^(n)x_i f_i = N barx]`

` " " = 1/N sum_(i=1)^(n)f_i x_(i)^(2) + barx^2 - 2barx^2`

`" " = 1/N sum_(i=1)^(n)f_i x_(i)^(2) - barx^2`

or `color{navy}(sigma^2 = 1/N sum_(i=1)^(n)f_ix_(i)^(2) - ((sum_(i=1)^(n)f_ix_(i))/N)^2)`

`" " color{navy}(= 1/(N^2) [ N sum_(i=1)^(n)f_ix_(i)^(2) - (sum_(i=1)^(n)f_ix_(i))^2])`

Thus, standard deviation `color{red}((sigma) = 1/N sqrt(N sum_(i=1)^(n)f_i x_(i)^(2) - (sum_(i=1)^(n)f_ix_i)^2))`
Q 3186178077

Calculate the mean, variance and standard deviation for the following
distribution :
Class 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 3 7 12 15 8 3 2

Solution:

From the given data, we construct the following Table 15.9.

Thus Mean `bar x = 1/N sum_(i=1)^7 f_i x_i = 3100/50 = 62`

Variance `(σ^2 )=1/N sum_(i=1)^7 f_i (x_i -bar x)^2`

` =1/50 xx 10050 = 201`

and Standard deviation `(σ) = sqrt (201) = 14.18`
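The following Python sketch recomputes this example from the class intervals and frequencies given in the question, replacing each class by its mid-point. It evaluates the variance twice, once from the definition and once from the alternative formula `1/(N^2) [N sum f_i x_i^2 - (sum f_i x_i)^2]`, to show that the two agree. The variable names are illustrative.

```python
import math

# Class intervals and frequencies from the example above.
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
f = [3, 7, 12, 15, 8, 3, 2]

x = [(lo + hi) / 2 for lo, hi in classes]        # mid-points 35, 45, ..., 95
N = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))
mean = sum_fx / N

# Definition: mean of the frequency-weighted squared deviations.
var_def = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x)) / N
# Alternative formula: (N * sum f x^2 - (sum f x)^2) / N^2.
var_alt = (N * sum_fx2 - sum_fx ** 2) / N ** 2

print(mean, var_def, var_alt, round(math.sqrt(var_def), 2))  # 62.0 201.0 201.0 14.18
```

The alternative formula needs only the two totals `sum f_i x_i` and `sum f_i x_i^2`, so the mean does not have to be computed first.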
Q 3146278173

Find the standard deviation for the following data :

Solution:

Let us form the following Table 15.10:

Now, by the alternative formula for standard deviation derived earlier, we have

` σ =1/N sqrt(N sum f_i x_(i)^2 - (sum f_i x_i)^2)`

`=1/48 sqrt (48 xx 9652 - (614)^2 )`

`= 1/48 sqrt (463296-376996)`

`=1/48 xx 293.77 = 6.12`

Therefore, Standard deviation (σ ) = 6.12
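Since the table for this example is not reproduced here, the sketch below starts from the three totals quoted in the solution (`N = 48`, `sum f_i x_i = 614`, `sum f_i x_i^2 = 9652`) and only repeats the final arithmetic. The variable names are illustrative.

```python
import math

# Totals quoted in the solution above (the underlying table is not shown here).
N, sum_fx, sum_fx2 = 48, 614, 9652

radicand = N * sum_fx2 - sum_fx ** 2      # 463296 - 376996
std_dev = math.sqrt(radicand) / N

print(radicand, round(std_dev, 2))        # 86300 6.12
```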

Shortcut method to find variance and standard deviation

Sometimes the values of `color{navy}(x_i)` in a discrete distribution, or the mid-points `color{navy}(x_i)` of different classes in a continuous distribution, are large, and so the calculation of mean and variance becomes tedious and time-consuming.

By using step-deviation method, it is possible to simplify the procedure.

Let the assumed mean be `A` and the scale be reduced to `color{navy}(1//h)` times (`h` being the width of class-intervals).

Let the step-deviations or the new values be `color{navy}(y_i.)`

i.e. ` " " y_i = (x_i-A)/h " or " x_i = A+hy_i` ......................(1)

We know that `color{green}(barx =( sum_(i=1)^(n) f_i x_i)/N)` ..........................................(2)

Replacing `color{navy}(x_i)` from `(1)` in `(2),`

`" " color{navy}(barx= ( sum_(i=1)^(n) f_i(A+hy_i))/N)`

` " " = 1/N (sum_(i=1)^(n) f_i A + sum_(i=1)^(n) h f_i y_i ) `

`" " = 1/N (A sum_(i=1)^(n) f_i + h sum_(i=1)^(n) f_i y_i)`

`" " = A. N/N + h (sum_(i=1)^(n)f_i y_i)/N " " ("because" sum_(i=1)^(n)f_i= N)`

Thus `" " color{navy}(barx = A + h bary)`, where `color{navy}(bary = (sum_(i=1)^(n)f_i y_i)/N)` is the mean of the `y_i` ..............................(3)


Now Variance of the variable `color{navy}(x, sigma_(x)^(2) = 1/N sum_(i=1)^(n)f_i (x_i-barx)^2)`

`" " color{navy}(= 1/N sum_(i=1)^(n)f_i(A+hy_i -A-h bary)^2)` (Using (1) and (3))

`" " color{navy}(= 1/N sum_(i=1)^(n)f_i h^2 (y_i-bary)^2)`

`" " color{navy}(=(h^2)/(N) sum_(i=1)^(n)f_i (y_i-bary)^2 = h^2 × "variance of the variable" \ \ y_i)`

i.e. ` color{navy}(sigma_(x)^(2) = h^2sigma_(y)^(2))`

or `color{navy}(sigma_x = hsigma_y)`....................................(4)

Applying the alternative formula for standard deviation to the variable `y_i` and using (4), we have

`color{navy}(sigma_x = h/N sqrt(N sum_(i=1)^(n)f_i y_(i)^(2) - ( sum_(i=1)^(n)f_i y_i)^2))`
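The two relations derived above, `barx = A + h bary` and `sigma_x = h sigma_y`, can be checked on any data set. The sketch below uses the ten observations from the first worked example with `A = 14` and `h = 2`; these choices and the variable names are only for illustration, and every frequency is taken as 1.

```python
import math
import statistics

x = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
A, h = 14, 2                                  # assumed mean and scale
y = [(xi - A) / h for xi in x]                # step-deviations y_i

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)   # divide by n

print(math.isclose(mean_x, A + h * mean_y))   # True
print(math.isclose(sd_x, h * sd_y))           # True
```

Shifting by `A` changes the mean but not the spread, while dividing by `h` scales both, which is why only `h` appears in the relation between the standard deviations.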
Q 3116278179

Calculate mean, Variance and Standard Deviation for the following
distribution.

Solution:

Let the assumed mean A = 65. Here h = 10
We obtain the following Table 15.11 from the given data :


Therefore `bar x = A + (sum f_i y_i)/50 xx h = 65 -15/50 xx 10 = 62`


Variance `σ^2 = h^2/N^2 [N sum f_i y_(i)^2 - (sum f_i y_i )^2]`

`= ( (10)^2 )/( (50)^2 ) [50 xx 105 -(-15)^2 ]`

`= 1/25 [ 5250 -225 ] = 201`

and standard deviation `(σ) = sqrt (201) = 14.18`
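The shortcut method can be sketched in Python as follows. The table for this example is not reproduced in this text; the sketch assumes the distribution is the one with classes 30-40, ..., 90-100 used in the earlier example, which reproduces the totals in this solution (`N = 50`, `sum f_i y_i = -15`, `sum f_i y_i^2 = 105`). The variable names are illustrative.

```python
import math

# Assumption: the distribution is the one with classes 30-40, ..., 90-100
# used earlier; it reproduces the totals quoted in this solution.
midpoints = [35, 45, 55, 65, 75, 85, 95]
f = [3, 7, 12, 15, 8, 3, 2]
A, h = 65, 10                                  # assumed mean and class width

y = [(x - A) // h for x in midpoints]          # -3, -2, -1, 0, 1, 2, 3
N = sum(f)
sum_fy = sum(fi * yi for fi, yi in zip(f, y))
sum_fy2 = sum(fi * yi ** 2 for fi, yi in zip(f, y))

mean = A + h * sum_fy / N
variance = h ** 2 * (N * sum_fy2 - sum_fy ** 2) / N ** 2
std_dev = math.sqrt(variance)

print(N, sum_fy, sum_fy2)                      # 50 -15 105
print(mean, variance, round(std_dev, 2))       # 62.0 201.0 14.18
```

The step-deviations are small integers, so the sums stay small compared with working directly on the mid-points.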

 